Main
Joshua Goldberg
Senior applied scientist with expertise in generative AI, abuse prevention, and forecasting. Leads innovative projects at Amazon, combining statistical methods, mathematical modeling, and generative AI to automate complex analyses, uncover strategic insights, and drive impactful operational decisions. Demonstrated experience developing models to proactively detect fraud and accurately forecast global product demand at scale. Proven leader in education and mentorship, regularly teaching and assisting in advanced data science and machine learning courses at the University of Chicago and Harvard, and mentoring junior and mid-level scientists.
Industry Experience
Sr. Applied Scientist
Amazon
Seattle, WA
Current - 2020
- Lead scientist in generative AI efforts in operations finance. Develop AI strategy and methodology for building and evaluating agentic systems. Mentor junior scientists on developing AI solutions.
- Employed statistical methods, mathematical decomposition techniques, and generative AI to automate analysis of outbound shipping costs, identify financial and operational deviations, and generate actionable insights used directly in operations finance and financial planning. Created generative AI-driven analytical tools with interactive chat capabilities, enabling stakeholders to intuitively deep-dive into model results and financial reports.
- Led the development of machine learning and NLP models to proactively identify and mitigate fraudulent and abusive behaviors from third-party sellers, significantly enhancing the integrity of Amazon’s marketplace by preventing phishing, customer diversion, and related fraudulent activities.
- Designed and deployed forecasting models predicting demand for over 500,000 products globally, optimizing inventory management and customer satisfaction.
- Mentored and guided junior and mid-level scientists on statistical techniques, mathematical modeling, time series analysis, software engineering best practices, and career growth.
- Established monthly science reading groups and science office hours, fostering a collaborative environment and ensuring continuous engagement with the latest research and techniques.
- Managed end-to-end lifecycle of science solutions—from concept and model design through implementation and deployment on AWS cloud infrastructure (SageMaker, ECS, Lambda).
AVP, Lead Data Scientist
Nuveen
Chicago, IL
2020 - 2017
- Pioneered an end-to-end deep learning time series model for client onboarding, covering both experimental design and execution; the model's estimated impact was $1 million in annual net revenue through an improved client journey (e.g., higher client retention and client growth)
- Built a recommendation engine for 150,000 clients across 50+ products
- Presented model/analysis to executive management; results included model adoption by 100+ salespeople and a significant increase in sales for clients treated by the model
- Conceptualized and created a simulation engine that detected, isolated, and measured the ROI impact of company sales events
Senior Equity Research Associate, Financial Services
Raymond James Financial, Inc.
Chicago, IL
2017 - 2014
- Built company and industry models using financial and statistical techniques, including regression and discounted cash flow (DCF) analysis
Education
STEM Continuing Education
Stanford, Harvard, University of Washington
N/A
Current - 2021
- I actively take STEM courses at different universities to enhance, revisit, or refine my technical skillset. Course topics include computer science, mathematics, machine learning, and statistics.
- Stanford: Convex Optimization I (364a) with Stephen Boyd, covering convexity, convex sets and functions, convex optimization problems (linear, quadratic, and semidefinite programs), Lagrangian duality and KKT conditions, optimality conditions, interior-point methods, and applications in signal processing, statistics, and machine learning; Convex Optimization II (364b), covering nonsmooth optimization (subgradients, cutting-plane methods), decomposition methods (dual decomposition, ADMM), proximal methods, large-scale and distributed optimization, robust and stochastic optimization, convex relaxations of nonconvex problems, and applications to machine learning and control
- Harvard: Calculus 2 with Series and Differential Equations; Linear Algebra and Differential Equations; Real Analysis; Abstract Linear Algebra; Differential Equations; Systems Programming and Machine Organization (CS61)
- University of Washington: Probability I; Probability II; Linear Optimization; Statistical Inference (STAT 512)
- Edmonds College: CS I, II, III; courses in C/C++ covering data structures and algorithms, and object-oriented design and programming
- University of Illinois Urbana-Champaign (UIUC): Calculus 1: First course in Calculus and Analytic Geometry
M.S. in Applied Data Science
University of Chicago
Chicago, IL
2020
- Coursework in statistics, linear algebra, machine learning, and deep learning
B.S. in Accounting and Finance
University of South Florida
Tampa, FL
2013
Selected Code Repositories
Machine learning decision tree and data frame implementation in C++
GitHub
Seattle, WA
2021
- Authored with John Nguyen
Generative adversarial network used to generate musical samples
University of Chicago
Chicago, IL
2020
- Capstone project and paper co-authored with Terry Wang and Rima Mittal; supervised by Yuri Balasanov
In my free time, I enjoy working with friends, peers, and colleagues on algorithm design and implementation. One project involved building data frame and decision tree classes in C++.
Teaching Experience
I am passionate about teaching and helping others. It brings me joy and satisfaction to teach others new skills.
Linear Algebra & Real Analysis
Harvard Continuing Education, TA
Remote
Current - 2024
Machine Learning, Statistics
University of Chicago, TA
Remote
Current - 2020
- Introductory statistics, machine learning, and time series analysis
Python for Data Science
University of Chicago, Instructor
Remote
2024 - 2022
- Topics include introductory and advanced Python: variables, logical operators, containers, loops, conditionals, comprehensions, functions, object-oriented programming (basics), advanced data analysis and manipulation with NumPy and pandas, model evaluation, parallel computation, and APIs
MastersTrack Statistics for Machine Learning and Machine Learning Courses
University of Chicago, Instructor & TA
Remote
2022 - 2020
- Statistics course: topics include simple and multiple regression, logistic regression, hypothesis testing, and variable transformations. Machine learning course: topics include a survey of machine learning algorithms: kNN, support vector machines, decision trees, random forests, boosted trees, and clustering algorithms
Data Understanding via SQL, Databases, and R
University of Chicago, Instructor & TA
Remote
2021 - 2020
- Topics include an introduction to databases, MySQL, and R